Achieving Operational Universality through a Turing Complete Chemputer

Gahler, Daniel, Thomas, Dean, Lach, Slawomir, Cronin, Leroy

arXiv.org Artificial Intelligence

The most fundamental abstraction underlying all modern computers is the Turing Machine: any computer that can simulate a Turing Machine (an equivalence called Turing completeness) can, in principle, achieve any task that can be algorithmically described by executing a series of discrete unit operations. In chemistry, programming chemical processes is demanding because it is hard to ensure that a process can be understood at a high level of abstraction and then reduced to practice. Herein we exploit the concept of Turing completeness, applied to robotic platforms for chemistry, to synthesise complex molecules through unit operations that execute chemical processes using a chemically-aware programming language, XDL. We map the concept of computability by computers onto the synthesizability of chemical compounds by automated synthesis machines. The results of an interactive demonstration of Turing completeness using the colour gamut and conditional logic are presented, and examples of chemical use-cases are discussed. Over 16.7 million combinations of the Red, Green, Blue (RGB) colour space were binned into 5 discrete values and measured over 10 regions of interest (ROIs), affording 78 million possible states per step, and served as a proxy for conceptual chemical-space exploration. This description establishes a formal framework for future chemical programming languages, ensuring that complex logic operations are expressed and executed correctly, with the possibility of error correction, in the automated and autonomous pursuit of increasingly complex molecules.
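The colour-binning proxy described in the abstract can be sketched as follows. This is our own illustrative code, not the authors' implementation, and the function names are hypothetical; it only shows how an 8-bit RGB channel (0-255) might be quantised into 5 discrete levels per ROI, giving 5^3 = 125 binned colours per region.

```python
def bin_channel(value, levels=5):
    """Map an 8-bit channel value (0-255) onto one of `levels` bins."""
    return min(value * levels // 256, levels - 1)

def bin_rgb(rgb):
    """Quantise an (R, G, B) triple into its binned representation."""
    return tuple(bin_channel(c) for c in rgb)

# 5 levels per channel -> 5**3 = 125 binned colours per ROI;
# a full state is one binned colour for each of the 10 ROIs.
colours_per_roi = 5 ** 3
```

With this quantisation, `bin_rgb((10, 130, 250))` yields `(0, 2, 4)`: nearby 24-bit colours collapse into the same discrete state, which is what makes the state space tractable to enumerate per step.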


Letters, Colors, and Words: Constructing the Ideal Building Blocks Set

Salazar, Ricardo, Jamshidi, Shahrzad

arXiv.org Artificial Intelligence

Define a building blocks set to be a collection of n cubes (each with six sides), where each side is assigned one letter and one color from a palette of m colors. We propose the novel problem of assigning letters and colors to each face so as to maximize the number of words from a chosen dataset that can be spelled as either mono words (all letters have the same color) or rainbow words (all letters have unique colors). We study this problem on a chosen set of English words, up to six letters long, from the typical vocabulary of a 14-year-old American, with n = 6 and m = 6 and the added restriction that each color appears exactly once on each cube. The problem is intractable: the size of the solution space makes a brute-force approach computationally infeasible. To address this, we explore a range of optimization techniques: random search, simulated annealing, two distinct tree search methods (greedy and best-first), and a genetic algorithm. Additionally, we attempted to implement a reinforcement learning approach; however, the model failed to converge to viable solutions within the problem's constraints. Among these methods, the genetic algorithm delivered the best performance, achieving a total of 2,846 mono and rainbow words.
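The objective being optimized can be made concrete with a small checker. The sketch below (our own code, with hypothetical names; the paper does not publish this routine) tests whether a word is spellable as a mono or rainbow word from a given set of cubes, using one face of a distinct cube per letter:

```python
from itertools import permutations, product

def classify(colours):
    """'mono' if all colours match, 'rainbow' if all distinct, else None."""
    if len(set(colours)) == 1:
        return "mono"
    if len(set(colours)) == len(colours):
        return "rainbow"
    return None

def spell_types(word, cubes):
    """Return the set of types ('mono', 'rainbow') achievable for `word`.

    `cubes` is a list of cubes; each cube is a list of (letter, colour)
    faces. Each letter of the word must come from a distinct cube.
    """
    types = set()
    for order in permutations(range(len(cubes)), len(word)):
        # Colours available for each position's letter on its assigned cube.
        options = [
            [colour for letter, colour in cubes[c] if letter == ch]
            for c, ch in zip(order, word)
        ]
        if any(not opts for opts in options):
            continue  # some letter cannot be shown under this cube order
        for colours in product(*options):
            t = classify(colours)
            if t:
                types.add(t)
    return types
```

A scoring loop would sum, over the word list, how many words return a non-empty set; the search methods in the abstract (simulated annealing, tree search, the genetic algorithm) then explore face assignments to maximize that count.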


A Brain Age Residual Biomarker (BARB): Leveraging MRI-Based Models to Detect Latent Health Conditions in U.S. Veterans

Bousquet, Arthur, Banerji, Sugata, Conneely, Mark F., Jamshidi, Shahrzad

arXiv.org Artificial Intelligence

Age prediction using brain imaging, such as MRIs, has achieved promising results, with several studies identifying the model's residual as a potential biomarker for chronic disease states. In this study, we developed a brain age predictive model using a dataset of 1,220 U.S. veterans (18--80 years) and convolutional neural networks (CNNs) trained on two-dimensional slices of axial T2-weighted fast spin-echo and T2-weighted fluid attenuated inversion recovery MRI images, acquired at the level of the anterior commissure and the frontal horns of the lateral ventricles. The model, incorporating a degree-3 polynomial ensemble, achieved an $R^{2}$ of 0.816 on the testing set. Residual analysis was performed to assess the residual's potential as a biomarker for five ICD-coded conditions: hypertension (HTN), diabetes mellitus (DM), mild traumatic brain injury (mTBI), illicit substance abuse/dependence (SAD), and alcohol abuse/dependence (AAD). Residuals grouped by the number of ICD-coded conditions demonstrated different trends that were statistically significant ($p = 0.002$), suggesting a relationship between disease states and predicted brain age. This association was particularly pronounced in patients over 49 years, where negative residuals (indicating advanced brain aging) correlated with the presence of multiple ICD codes. These findings support the potential of residuals as biomarkers for detecting latent health conditions.
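The grouping step the abstract describes (residuals binned by ICD-coded condition count) can be sketched as follows. This is an illustrative outline with hypothetical names, not the authors' analysis code; it assumes the residual is defined as actual age minus predicted age, so a negative residual means the model predicted an older-looking brain:

```python
import statistics

def residuals_by_condition_count(records):
    """Group brain-age residuals by number of ICD-coded conditions.

    `records` is an iterable of (actual_age, predicted_age, n_conditions)
    tuples; returns {n_conditions: mean residual}, where the residual is
    actual_age - predicted_age (negative = advanced predicted brain age).
    """
    groups = {}
    for actual, predicted, n_conditions in records:
        groups.setdefault(n_conditions, []).append(actual - predicted)
    return {k: statistics.mean(v) for k, v in sorted(groups.items())}
```

A significance test across these groups (the paper reports $p = 0.002$) would then ask whether the mean residual trends downward as the condition count rises.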


NewsStories: Illustrating articles with visual summaries

Tan, Reuben, Plummer, Bryan A., Saenko, Kate, Lewis, JP, Sud, Avneesh, Leung, Thomas

arXiv.org Artificial Intelligence

Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text lengths and numbers of images. In addition, unlike prior work, which assumed captions have a literal relation to the image, we assume images bear only a loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. Finally, we introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.
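The retrieval setting can be made concrete with a minimal scorer: given an article embedding, rank candidate image sets by the similarity between the article and the mean of each set's image embeddings. This is our own illustrative sketch with hypothetical names, not the paper's baseline or model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_embedding(vectors):
    """Pool a set of image embeddings into one vector by averaging."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rank_image_sets(article_vec, image_sets):
    """Return candidate image-set indices, best match first."""
    scores = [cosine(article_vec, mean_embedding(s)) for s in image_sets]
    return sorted(range(len(image_sets)), key=lambda i: -scores[i])
```

Mean pooling makes the score independent of how many images a set contains, which is one simple way to handle the varying image counts the abstract highlights.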